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Abstract 

The extended Kalman filter (EKF) has been applied to inferring gene regulatory networks. However, it is well known 
that the EKF becomes less accurate when the system exhibits high nonlinearity. In addition, certain prior information 
about the gene regulatory network exists in practice, and no systematic approach has been developed to incorporate 
such prior information into the Kalman-type filter for inferring the structure of the gene regulatory network. In this 
paper, an inference framework based on point-based Gaussian approximation filters that can exploit the prior 
information is developed to solve the gene regulatory network inference problem. Different point-based Gaussian 
approximation filters, including the unscented Kalman filter (UKF), the third-degree cubature Kalman filter (CKF3), and 
the fifth-degree cubature Kalman filter (CKF5) are employed. Several types of network prior information, including the 
existing network structure information, sparsity assumption, and the range constraint of parameters, are considered, 
and the corresponding filters incorporating the prior information are developed. Experiments on a synthetic network 
of eight genes and the yeast protein synthesis network of five genes are carried out to demonstrate the performance 
of the proposed framework. The results show that the proposed methods provide more accurate inference results 
than existing methods, such as the EKF and the traditional UKF. 

Keywords: Gene regulatory network; Point-based Gaussian approximation filters; Network prior information; Sparsity; 
Iterative thresholding 



1 Introduction 

Inferring gene regulatory network (GRN) has become 
one of the most important missions in system biol- 
ogy. Genome- wide expression data is widely used due 
to the development of several high-throughput experi- 
mental technologies. The gene regulatory network can 
be inferred from a number of gene expression samples 
taken over a period of time. Modeling of GRN is required 
before its structure can be inferred. Common dynamical 
modeling methods of GRN include Bayesian networks [1], 
Boolean networks [2], ordinary differential equations [3], 
state-space models [4,5], and so on. Various approaches 
based on different models have been used to infer the 
network from observed gene expression data, such as the 
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Markov Chain Monte Carlo (MCMC) methods for the 
dynamic Bayesian network model [6] and the ordinary 
differential equation model [7], as well as the Kalman fil- 
tering methods for the state-space model [4,8] and the 
ordinary differential equation model [3]. Some survey 
papers can be found in [9-12]. 

Due to the stochastic' nature of the gene expression, 
the Kalman filtering approach based on the state-space 
model is one of the most competitive methods for infer- 
ring the GRN. The Kalman filter is optimal for linear 
Gaussian systems. However, the GRN is generally highly 
nonlinear. Hence, advanced filtering methods for nonlin- 
ear dynamic systems should be considered. The extended 
Kalman filter (EKF) is probably the most widely used 
nonlinear filter which uses the first-order Taylor series 
expansion to linearize the nonlinear model. However, the 
accuracy of the EKF is low when the system is highly 
nonlinear or contains large uncertainty. The point-based 
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Gaussian approximation filters have been recently pro- 
posed to improve the performance of the EKF, which 
employ various quadrature rules to compute the integrals 
involved in the exact Bayesian estimation. Many filters 
fall into this category, such as the unscented Kalman fil- 
ter (UKF) [13], the Gauss-Hermite quadrature filter [14], 
the cubature Kalman filter (CKF) [15], and the sparse-grid 
quadrature filter [16]. Besides the point-based Gaussian 
approximation filters, the particle filter has drawn much 
attention recently [17]. The particle filter uses random 
particles with weights to represent the probability den- 
sity function (pdf ) in the Bayesian estimation and provides 
better estimation result than the EKF. The main problem 
of the particle filter is that the computational complexity is 
high, and therefore, it is hard to use for high-dimensional 
problems, such as the problem considered in this paper. 

The EKF and the particle filter have been used for 
the inference of GRN [4,8,18]. In this paper, we con- 
sider the point-based Gaussian approximation filters. Our 
main objective is to provide a framework of incorporating 
network prior information into the filters. For example, 
some gene regulations may be known [19] from litera- 
ture and the inference accuracy of GRN can be improved 
by incorporating the known regulations of the GRN [20]. 
Integration of the prior knowledge or constraints with the 
GRN inference algorithm has been introduced to improve 
the inference result. The DNA motif sequence in gene 
promoter regions is incorporated in [21] while modeling 
of transcription factor interactions is incorporated in [22]. 
As mentioned in [20], experimentally determined physi- 
cal interactions can be obtained. In addition, the sparsity 
constraint is frequently used in the inference of the GRN. 
To the best of the authors' knowledge, the most related 
work in incorporating the prior information in Bayesian 
filters is [8] . In that work, rather than directly getting the 
inference results from the filter, an optimization method 
is used. In particular, a cost function is used in which the 
sparsity constraint is enforced. However, the cost func- 
tion in [8] does not consider the uncertainty of the state 
in the filtering. That cost function in fact is not cou- 
pled well with the filtering algorithm. In addition, it did 
not consider other kinds of prior information. In this 
paper, we propose a new framework that incorporates 
the prior information effectively in the filtering algorithm 
by solving a constrained optimization problem. Efficient 
recursive algorithms are provided to solve the associated 
optimization problem. 

The remainder of this paper is organized as follows. 
In Section 2, the modeling of gene regulatory network 
is introduced. The point-based Gaussian approximation 
filters are briefly introduced in Section 3. The proposed 
new filtering framework is described in Section 4. In 
Section 5, experimental results are provided. Finally, con- 
cluding remarks are given in Section 6. 



2 State-space modeling of gene regulatory 
network 

The GRN can be described by a graph in which genes are 
viewed as nodes and edges depict causal relations between 
genes. The structure of GRN reveals the mechanisms of 
biological cells. Analyzing the structure of GRN will pave 
the way for curing various diseases [23]. The learning of 
GRN has drawn much attention recently due to the avail- 
ability the microarray data. By analyzing collected gene 
expression levels over a period of time, one can identify 
various regulatory relations between different genes. To 
facilitate the analysis of the GRN, modeling of GRN is 
required. Different models can be used, such as Bayesian 
networks [1], Boolean networks [2], ordinary differential 
equation [3], and state-space model [4,5]. The state-space 
model has been widely used because it incorporates noise 
and can make use of computationally efficient filtering 
algorithms [5] . Thus, we also use the state-space modeling 
of GRN in this paper. 

Under the discrete-time state-space modeling of the 
gene regulatory networks, the network evolution from 
time k to time k — 1 can be described by 

*k =/(**-i) + n> (1) 

where x k = [xi )k , • • • >Xn,k\ T * s tne state vector an d 
denotes the gene expression level of the /-th gene at time /<. 
/ is a nonlinear function that characterizes the regulatory 
relationship among the genes, v k is the state noise and it 
is assumed to follow a Gaussian distribution with mean 0 
and covariance matrix Q k , i.e., v k ~ A/"(0, Q k ). 

Following [8], we use the following nonlinear function 
in the state Equation (1): 



fix) = Ag(x), 



with 



and 



giipc) = 



_gn(%n) _ 



1 _|_ e -ii t x 



(2) 



(3) 



(4) 



In (2), A is the regulatory coefficient matrix with ay 
denoting the regulation coefficient from gene ; to gene 
i. Note that a positive coefficient ay indicates that gene 
j activates gene i and a negative ay indicates that gene j 
represses gene /. In (4), /x; is a parameter. Note that A and 
fii are unknown parameters. The discrete-time nonlin- 
ear stochastic dynamic system [24] shown in Eqs. (l)-(3) 
have been successfully used to describe the GRN [4,8]. 
Equation (4) is also called Sigmoid function which is fre- 
quently used since it is consistent with the fact that all 
concentrations get saturated at some point in time [25]. 
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The Sigmoid function has been used in modeling GRN to 
verify various methods, such as artificial neural network 
[26], simulated annealing and clustering algorithm [27], 
extended Kalman filter [4], particle filter [8], and Genetic 
programming and Kalman filtering [25]. 

For the measurement model, we consider the following 
general nonlinear observation equation 



and 



y k = h(x k ) + n k , 



(5) 



where /*(•) is some nonlinear function, n k is the mea- 
surement noise, which is assumed to follow the Gaussian 
distribution with mean 0 and covariance matrix R k , 
i.e., n k ~ Af(0,R k ). For example, if the noise corrupted 
expression levels are observed, then h(x) = x. 

3 Network inference using point-based Gaussian 
approximation filters 

3.1 Gaussian approximation filters 

In this section, the framework of point-based Gaussian 
approximation filters for the state-space dynamic model 
is briefly reviewed. We consider the state-space model 
consisting of the state Equation (1) and the measurement 
Equation (5). We denote y k = \y v . . . ,y k ]. 

The optimal Bayesian filter is composed of two steps: 
prediction and filtering. Specifically, given the prior pdf 
p(x k -i\y k ~ l ) at time k — 1, the predicted conditional pdf 
p(x k \y k ~ l ) is given by 



p(x k \y k *) = j P(Xk\x k -i)p(x k -i\/ l H*k- 



(6) 



After the measurement at time k becomes available, the 
filtered pdf is given by 



p(x k \y k ) = 



p(yk\ x k)p(xk\y k *) 
j p(yk\ x k)p(xk\y k ~ l )&xi< ' 



(7) 



The pdf recursions in (6) and (7) are in general com- 
putationally intractable unless the system is linear and 
Gaussian. The Gaussian approximation filters approx- 
imate (6) and (7) by invoking Gaussian assumptions. 
Specifically, the first assumption is that given y k ~ l , 
x k -\ has a Gaussian distribution, i.e., x k -\\y k ~ l ~ 
A/'(^_i|/ r _i,P/:_i|/ : _i). The second assumption is that 
(x k) y k ) are jointly Gaussian given y k ~ l . 

It then follows from the second assumption that given 
y k ~ l , x k has a Gaussian distribution, i.e., x k \y k ~ 1 ~ 
J\f(x k \ k -i, P k \ k -i). Using (1) and the first assumption, we 
have the predicted mean x k \ k -\ and covariance Pk\k-i 
given respectively by 

h\k-i = nx k \y k ~ l ) = E^iyt-i 

f(x)(/)(x; (8) 



P m _ x 4 Covtel/" 1 } 

= E ^_!|/-i {(f(*k-l)-*k\k-i) 

x -^i^-i) r | + Qk 

= / (fix) -x k \k-i) 



x (f(x) -x k \k-\) T ^ (*; x k -i\k-i> 
P k -i\k-i)dx+Q k _ v (9) 

where <p (x; x, P) denotes the multivariate Gaussian pdf 
with mean x and covariance P. 

Then, following the second assumption, given y k = 
[y k ~ 1 ,y k ], x k is Gaussian distributed, i.e., x k \y k ~ 
N(Xk\k>Pk\k)< Using the conditional property of the mul- 
tivariate Gaussian distribution, the filtered mean x k \ k and 
covariance P k \ k are given respectively by 



xk\k = nx k \y k >y k ~ 1 } 

= Xk\k-i+L k {y k -y m _ l ) 
and P k \ k = Covfel^,/" 1 } 
= P klk . 1 -L k Ff, 



(10) 
(11) 



with 



h\k-i = E * t i/-i {*(**)} 

= J h(x)<p(x;x k \ k -i,P k \ k -i)dx, (12) 
L k =Pf(R k +P y k y y\ (13) 
= E **i/-i {(* " h\k-\)(h(x) - y k \k-\) T \ 
= j {x - x k \ k - x ){h{x) - j^_i) T 

^(x-.x^i.P^^dx, (14) 
If = E^-i {(*(*) -^ N )W-^_/) 

(*(*)- -5>^| /t -l) T 

4> (x;x k \ k -i,P k \ k -i) Ax. (15) 



3.2 Point-based Gaussian approximation filters 

The integrals in (8), (9), (12), (14) and (15) are Gaussian 
type that can be efficiently approximated by various 
quadrature methods. Specifically, if a set of weighted 
points {(yi, wf), i = 1, . . . , N] can be used to approximate 
the integral 



f N 

/ h(x) (p (x; 0, /) dx ^ ^ ^ih(yi) 

•* i=i 



(16) 
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then the general Gaussian-type integral can be approxi- 
mated by 

f N 

/ h(x)</> (x; x, P) dx ^ ^ w i h(Sy i + x), (17) 

where P = SS T and S can be obtained by Cholesky 
decomposition or singular value decomposition (SVD). 

Using (17), we can then approximate (8) and (9) as 
follows: 

N 

%-l *I>i/ ($*-!,.') ( 18 ) 



and 



i=l 



N 



Pk\k-1 ~ ^2 W{ f (%k-l,i ~ h\k-\) 



i=l 



x (h-i,i-*k\k-i) T + Qk-v 



(19) 



where § k _ Xi is the transformed quadrature point obtained 
from the covariance decomposition, i.e., 

Pk-i\k-i = Sk-iSl-v (20) 

h-u = S k -iYi+x k -i\k-i- (21) 

Similarly, we can approximate (12), (14) and (15) as 
follows: 

N 

h\k-i = (£*,*)> (22) 

i=l 

N T 

= Y, w ' {hi - %-i) (*(f«) - h\k-i) • (23) 

i=l 

N T 

pyy = (Hhi)-h\k-i) (Hhi)-h\k-i) > (24) 



where is the transformed point obtained from the 
decomposition of the predicted covariance, i.e., 



Pk\k-1 = Sk&k > 

lk,i = Skyi+*k\k-l- 



(25) 
(26) 



Various numerical rules can be used to form the 
approximation in (16), which lead to different Gaussian 
approximation filters. In particular, the unscented 
transformation, the Gauss-Hermite quadrature rule, and 
the sparse-grid quadrature rules are used in the unscented 
Kalman filter (UKF), the Gauss-Hermite quadrature 
Kalman filter (GHQF), and the sparse-grid quadrature 
filter (SGQF), respectively. 

Recently, the fifth-degree quadrature filter has been 
proposed and shown to be more accurate than the third- 
degree quadrature filters, such as the UKF and the third- 
degree cubature Kalman filter (CKF3), when the system is 
highly nonlinear or contains large uncertainty [16]. In this 
paper, we consider the UKF, CKF3, and the fifth-degree 



cubature Kalman filter (CKF5). Other filters such as the 
central difference filter [14] and divided difference filter 
[28] can also be used. The CKF5 is based on Mysovskikhs 
method which uses fewer point than the fifth-degree 
quadrature filter in [16]. In the following, different numer- 
ical rules used in (16) are briefly summarized. 

3.2. 7 Unseen ted transform 

In the unscented Kalman filter (UKF), we have N = 2n + 1 
where n is the dimension of x. The quadrature points and 
the corresponding weights are given respectively by 



Yi = 



and 



0, 



J{n + ic)ei-i, 



- J (n + K)ei- n -i, 



i = l, 

i = 2, • • • , n + 1, 
i = n + 2, • • • , 2n + 1, 



n + k 
1 



1, 



/ = 2, • • • , 2n + 1, 



(27) 



(28) 



2(n + k) 

where k is a tunable parameter, and e; is the /-th n- 
dimensional unit vector in which the /-th element is 1 and 
other elements are 0. 

3.2.2 Cubature rules 

The left-hand side of (16) can be rewritten as 

j h(x) <fi (x; 0,7) dx = — ^ J h (V2xj exp [—x T x^ dx. 

(29) 

Consider the integral 

I(h) = j h(x)exp(-x T x^)dx. (30) 

By letting x = rs with s T s = 1 and r = \l x T x, 1(h) can 
be rewritten in the spherical-radial coordinate system as 

I(h)= t f fc(rs)r w_1 exp (~r 2 ) da (s) dr, (31) 

J0 JUn 

where U n = {s e R n : ||s|| = 1}, and o (•) is the spherical 
surface measure or the area element on U n . 

Note that (31) contains two types of integrals: the radial 
integral / 0 °° h r (r) r w_1 exp (— r 2 ) dr and the spherical inte- 
g™l f Un hs(s)da(s). 

If the radial rule can be approximated by 

/ h r (r)r n ~ l e^ (-r 2 ) dr « T w r ,Mn), (32) 
Jo i=i 
and the spherical integral can be approximated by 



/ h s (s) da (s) w w s>J h s 



(33) 
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then (31) can be approximated by 

N r N s 
i=l ;=1 



(34) 



A third-degree cubature rule to approximate (29) is 
obtained by using the third-degree spherical rule and 
radial rule [15]: 

r l n 

J h(x)</> (x; 0, /) d* « — [ h Wn*i) + * (->/««/)] • 



(35) 

Remark: The third-degree cubature rule is identical to 
the unscented transformation with k = 0. 

To construct the fifth-degree cubature rule, the 
Mysovskikhs method [29] and the moment matching 
method [16] are used to provide the fifth-degree spheri- 
cal rule and radial rule, respectively. The final fifth-degree 
cubature rule is given by 



3.3 Augmented state-space model for network inference 

In the state-space model for gene regulatory networks 
described in Section 3.2, the underlying network structure 
is characterized by the n x n regulatory coefficient matrix 
A in (2) and the parameters ft = [/xi, . . . , in (4). The 
problem of network inference then becomes to estimate A 
and fi. To do that, we incorporate the unknown parame- 
ters A and fi into the state vector to obtain an augmented 
state-space model, and then apply the point-based Gaus- 
sian approximation filters to estimate the space vector and 
thereby obtaining the estimates of A and ft. 

Specifically, we denote 0 = [an,ai2, • • • ,a\ m • • • , 
a<nn> Mi, • • • , t^ n ] T and the augmented state vector x k = 

^x k , 0 r J . Then, the augmented state equation can be 
written as 



=/(**- i) + n = 



Ak-lgk-l( x k-l) 
0 k -i 



+ 



n-i 
o 



(40) 



h(x)4> (x; 0,1) dx ' 



n + 2 



+ 



+ 



+ 



+ 



fc(0)+ 

n 2 {7 - n) 
2(n + l) 2 (n + 2) 2 

h(-V^+2sf)] 



Note that Ak-\ and^ /c _ 1 can be obtained from Ok-i, and 
v k ~ A/"(0, Q k ) with Q k = diag ([Q k O w 2 +w ]), where O m 
denotes an m x m all-zero matrix. 

_ In the remainder of this paper, we assume that the 

noisy gene expression levels are observed. Therefore, the 
augmented measurement equation becomes 



i=l 



y k = h(x k ) + n k = Bx k + n k , 



(41) 



2(n - 1) 



2 «(«+l)/2 



(n + l) 2 (n + 2) 2 
*(-V^+2s®)] : 



where the point sf* is given by 



o(0 



[h(^+2s?) where B = [/„,0„ x 

{n 2 +ri) ]» ^x(w 2 +w) denotes an n x 
i=i (n 2 + all zeros matrix. 

The point-based Gaussian approximation filters can 
then be used to obtain the estimate of the augmented 
state, from which the estimates of the unknown net- 
work parameters, i.e., A and /t can then be obtained. 

Note that since the measurement Equation (41) is linear, 
the filtering Equations (10, 11) become 



(36) 



i = 1,2,-.. ,n+l, (37) 



with 



(0 
Pi = 



n + 1 



n(n-j + 2)(n-j+l) 



(n + l)(n-i+l) 
n(n - / + 2) 9 



0, 



J = h 

j > L 



(38) 



Moreover, the set of points {s^ } is given by 

2,-.. ,n + lj. (39) 



*k\k = *k\k-i + Lk(y k - Bx k \k-i), (42) 
and P k \k = P k \k-i -L k BP k \k-i, (43) 
with L k = P k \ k _iB T (R k + BPk\k-\B T )~ l > (44) 

which are the same as the filtering updates for Kalman 
filters. 

4 Incorporating prior information 

In practice, some prior knowledge on the underlying 
GRN is typically available. In this section, we outline 
approaches to incorporating such prior knowledge into 
the point-based Gaussian approximation filters for net- 
work inference. In particular, we consider two types of 
prior information, namely, sparsity constraints and range 
constraints on the network. For networks with spar- 
sity constraints, we incorporate an iterative threshold- 
ing procedure into the Gaussian approximation filters. 
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And to accommodate range constraints, we employ PDF- 
truncated Gaussian approximation filters. 

4.1 Optimization-based approach for sparsity constraints 
4.1.1 The optimization formulations 

Note that under the Gaussian assumption, the state esti- 
mation %k\k of the Kalman filter is equivalently given by 
the solution to the following optimization problem [30,31] 

x k \ k = arg min J(x), (45) 

X 

with /(*) 4 (y k - Hx)) 7 !*- 1 (y k - h(x)) 

+ (x- **|it-l) P k\k-l (* - **|*-i) • 

(46) 

To incorporate the prior information of the GRN, (46) is 
modified as 



J(x)=J(x)+XJ p (x), 



(47) 



where } p (x) is a penalty function associated with the prior 
information and A. is a tunable parameter that regulates 
the tightness of the penalty. 

For example, in gene regulatory networks, each gene 
only interacts with a few genes [20]. To capture such a 
sparsity constraint, a Laplace prior distribution can be 
used for the connection coefficient matrix A, i.e., 



p(A) = (X/2) n2 exp I j • 



(48) 



Therefore, in this case, J p (x) = — \ogp(A) =c\ \\A\\ \ + C2 
where c\ and c<i are constants. And, (47) can be rewritten 
as 



J(x)=J(x)+X\\Ah 



(49) 



Note that (49) can also be interpreted as the result of 
applying the least squares shrinkage selection operator 
(LASSO) to (47). The LASSO adds an Li-norm constraint 
to the GRN so that the regulatory coefficient matrix A 
tends to be sparse with many zero elements. 

As another example, if some known regulatory rela- 
tionship exists, then it should be taken into account to 
improve the estimation accuracy. Specifically, define an 
n x n indicator matrix E =[e^-] where ey = 1 indicates that 
there is a lack of regulation from gene ; to gene L Then, 
similar to the use of LASSO, a penalty on ay should incur 
if etj = 1. Thus, (47) can be rewritten as 



m=m + k\\EoA\\ 1 . 



(50) 



Note that as in [20], here we do not force ay = 0 cor- 
responding to ey = 1 but rather use an L\ -norm penalty. 
The advantage of such an approach is that it allows the 
algorithm to pick different structures but more likely to 



pick the edges without penalties. ( o denotes the entry- 
wise product operation of two matrices. 

4. 1 .2 Iterative thresholding algorithm 

Solving the optimization problems in (49) and (50) is not 
straightforward since \a\ is non-differentiable at a = 0. 
In the following, an efficient solver called the iterative 
thresholding algorithm is introduced. 

For convenience, we consider a general optimization 
problem of the form 



arg min J(x) = L(x) + ox\\\, 



(51) 



where X = [A 1,^2, ••• , X n 2+ 2n \ T andZ,(#) is a smooth func- 
tion. Note that if X = [0i xn ,X x l lxw 2, 0\ xn ] T , then (51) 
becomes (49); and if X = [0i xn ,X x 

^,0i x „] r , then (51) 
becomes (50). Note that £ = [en, e\i, • • • , e\ n , • • • , e nn ] T . 

The solution to (51) can be iteratively obtained by solv- 
ing a sequence of optimization problems. As in Newton's 
method, the Taylor series expansion of L(x) around the 
solution x l at the £-th iteration is given by 

L& + A*) = L(&) + A* r VL(^) + y \\Ax\\j, (52) 

where VX is the gradient of L and a t is such that a t I 
mimics the Hessian V 2 L. Then, x M is given by [32] 

x M = arg min (z-^) r VI(^) + — ||z-^||^ + ||Xoz||i. 
z 2 

(53) 

The equivalent form of (53) is given by [32] 



zt+l 



with u l = 



at 
j 



arg min -\\z — u 
z 2 

x l - — VL(A 
at 

llc*l|2 ' 



VL(^) - VIC^- 1 ). 



t "2 + -||Xoz||i, (54) 
ott 



The solution to (54) is given by [32] x t+l 
where 

r] S (u,a) = sign(^)max{|^| — a, 0} 



(55) 

(56) 

(57) 
(58) 



V S (u l , £), 



(59) 



is the soft thresholding function with sign(w) and 
max { \u\ — a, 0} being component- wise operators. 

Finally, the iterative procedure for solving (51) is given 
by 



x t+l = sign 



| \jf - —VLix*) 






11 a t 


ott J 



(60) 
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And the iteration stops when the following condition is 
met 



< e, 



where 6 is a given small number. 



(61) 



4.2 PDF truncation method for range constraints 

If the range constraints on the regulatory coefficients 
are available, the inference accuracy can be improved by 
enforcing the constraints in the Gaussian approximation 
filters. 

In particular, assume that we impose the following range 
constraints on the state vector x 



c < x < d. 



(62) 



The PDF truncation method [31] can be employed to 
incorporate the above range constraint into the Gaussian 
approximation filters, by converting the updated mean 
Xk\k and covariance Pk\k to the pseudo mean and 
covariance P^ k which are then used in the next prediction 
and filtering steps. 

We next briefly outline the PDF truncation procedure. 
We use . and t - to denote the mean and covariance 
after the first i constraints have been enforced. Initially, we 
set#]L 0 = Xk\k andP£|£ 0 = Pk\k< Consider the following 
transformation 



z k ,i = GiV i 1/2 ST(x k -^ k{k .) 



(63) 



where Si and V[ are obtained from the Jordan canoni- 
cal decomposition S{D{Sj — P^ k . and Qi is obtained by 
using the Gram-Schmidt orthogonalization and it satisfies 
[33] 



Q{D) I2 S] 



1/2 



,<>,••• ,0 



(64) 



Then, the upper bound efx < di is transformed to [33] 



[i,o,---,o]^<-r- 



di e I x k\k,i 



d{. 



(65) 



Similarly, the lower bound efx > c; is transformed to 

T-t 

[l,0,---,0]z^> XJ 1 *^, 2 =~ c i- ( 66 ) 

The constraint requires that the first element of Zk,i lies 
between c, and d,. Hence, only the truncated PDF of the 
first element of Zk,i is considered and it is given by [33] 

f(z) = ai exp(-z 2 /2), (67) 
V2 



with oti — 



(68) 



V^[erf(^/V2) - erf(c,/V2)] 

Then, the mean and variance of the first element of Zk,i 
after imposing the i-th constraint are given respectively by 

M=J_' zf{z)dz = m [exp(-c?/2) - exp(-d?/2)] , 

(69) 

af = j d \z- IM ) 2 f{z)dz 

J'Ci 

= «i[exp(-c 2 /2)(Cj - 2(j, t ) 

- exp(-df/2)(di - 2 W )] + \A + 1. (70) 

Thus, the mean and covariance of the transformed state 
vector after imposing the i-th constraint are given respec- 
tively by 

~z kli = lm,o,--- ,o] T , (7i) 

Q k>i = diag([of,l,-- ,1]). (72) 
By taking the inverse transform of (63), we then get 



(73) 



x k\k,l+\ — S&i Gl z k,i + x k\k,i' 

*W = S{D\l 2 Gjd k fi{D\l 2 Sj. (74) 

After imposing all n constraints, the final constrained 
state estimate and covariance at time k are given respec- 
tively by % m 4 %^ kn andP^ 4 P^. 

5 Numerical results 

5.1 Synthetic network 

In this section, a synthetic network that contains eight 
genes is used to test the performance of the EKF, the 



Table 1 Comparison of UKF with different k 







True positive rate 






False positive rate 




Positive predictive rate 




Filters 


Min 


Max 


Avg 


Min 


Max 


Avg 


Min 


Max 


Avg 


UKF(/c = -5) 


0.7576 


0.9355 


0.8472 


0.5000 


0.7647 


0.5955 


0.5094 


0.6279 


0.5824 


UKF(/c = -2) 


0.7576 


0.9355 


0.8406 


0.5161 


0.7647 


0.5933 


0.5094 


0.6279 


0.5825 


UKF(/c = 0) 


0.7576 


0.9375 


0.8426 


0.5161 


0.7647 


0.5918 


0.5094 


0.6364 


0.5840 


UKF(/c = 2) 


0.7576 


0.9375 


0.8407 


0.5152 


0.7353 


0.5895 


0.5098 


0.6279 


0.5841 


UKF(/c = 5) 


0.7576 


0.9063 


0.8394 


0.5161 


0.7353 


0.5933 


0.5192 


0.6279 


0.5821 



Table 2 Comparison of different filters 







True positives # 




False positives # 






True negatives # 






False negatives # 




Filters 


Min 


Max 


Avg Min 


Max 


Avg 


Min 


Max 


Avg 


Min 


Max 


Avg 


EKF 


2 


17 


1 0.60 23 


44 


36.4 


2 


15 


7.08 


2 


24 


9.92 


UKF 


25 


29 


26.80 16 


26 


19.28 


8 


16 


13.06 


2 


8 


4.86 


CKF 3 


25 


30 


26.74 16 


26 


19.10 


8 


15 


13.14 


2 


8 


5.02 


CKF 5 


25 


29 


26.64 16 


26 


19.24 


8 


16 


13.08 


1 


8 


5.04 








True positive rate 








False positive rate 






Positive predictive rate 




Filters 




Min 


Max 


Avg 




Min 


Max 


Avg 


Min 


Max 


Avg 


EKF 




0.0769 


0.8667 


0.5224 




0.6053 


0.9545 


0.8358 


0.0800 


0.3208 


0.2231 


UKF 




0.7576 


0.9355 


0.8472 




0.5 


0.7576 


0.5955 


0.5094 


0.6279 


0.5824 


CKF 3 




0.7576 


0.9375 


0.8426 




0.5161 


0.7647 


0.5918 


0.5094 


0.6364 


0.5840 


CKF 5 




0.7576 


0.9667 


0.8417 




0.5000 


0.7647 


0.5946 


0.5094 


0.6279 


0.5814 



Table 3 Inferred results of the conventional filter and filters incorporating the prior information 







True positives # 




False positives # 






True negatives # 






False negatives # 




Filters 


Min 


Max 


Avg Min 


Max 


Avg 


Min 


Max 


Avg 


Min 


Max 


Avg 


UKF 


25 


29 


26.80 16 


26 


19.28 


8 


16 


13.06 


2 


8 


4.86 


UKF p1 


25 


29 


27.34 14 


19 


16.52 


13 


18 


15.72 


2 


8 


4.42 


UKF p2 


23 


26 


24.16 13 


16 


13.86 


16 


18 


17.20 


7 


10 


8.78 


UKF p3 


25 


29 


26.70 12 


24 


17.50 


9 


19 


14.50 


3 


8 


5.30 








True positive rate 








False positive rate 






Positive predictive rate 




Filters 




Min 


Max 


Avg 




Min 


Max 


Avg 


Min 


Max 


Avg 


UKF 




0.7576 


0.9355 


0.8472 




0.5 


0.7647 


0.5955 


0.5094 


0.6279 


0.5824 


UKF p1 




0.7576 


0.9355 


0.8614 




0.4375 


0.5935 


0.5121 


0.5778 


0.6744 


0.6239 


UKF p2 




0.6970 


0.7879 


0.7335 




0.4194 


0.5000 


0.4462 


0.5897 


0.6667 


0.6355 


UKF p3 




0.7576 


0.9063 


0.8348 




0.3871 


0.7273 


0.5463 


0.5294 


0.6923 


0.6049 
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Table 4 Comparison of UKF^i using different X 









True positive rate 






False positive rate 




Positive predictive rate 


Filters 




Min 


Max 


Avg 


Min 


Max 


Avg 


Min 


Max 


Avg 


UKF p1 (A, 


= 0.1) 


0.7576 


0.9355 


0.8484 


0.5000 


0.7647 


0.5900 


0.5094 


0.6279 


0.5850 


UKF p1 (X 


= 0.5) 


0.7576 


0.9677 


0.8535 


0.4688 


0.7647 


0.5696 


0.5094 


0.6512 


0.5948 


UKF p1 (X 


= 1) 


0.7576 


0.9355 


0.8614 


0.4375 


0.5935 


0.5121 


0.5778 


0.6744 


0.6239 


UKF p1 (X 


= 5) 


0.7500 


0.9355 


0.8439 


0.3548 


0.5455 


0.4672 


0.5814 


0.7105 


0.6456 



UKF p i(A = 10) 0.7273 0.9063 0.8217 0.3226 0.4848 0.4156 0.6190 0.7368 0.6695 



UKF, the CKF3, the CKF5, and their corresponding fil- 
ters incorporating the prior information. Forty data points 
are collected to infer the structure of the network. The 
system noise and measurement noise are assumed to be 
Gaussian distributed with means 0 and covariances Q k = 
diag([0.01/ 8 0 72 ]) and/?/, = 0.01/ 8 , respectively. The 
connection coefficient matrix is given by 



A = 



/ 


0 


0 


0 


0 


0 


0 


2.4 


3.2 


\ 




0 


0 


0 


4.1 


0 


-2.4 


0 


4.1 






-5.0 


2.1 


-1.5 


0 


4.5 


0 


2.1 


0 






0 


1.3 


2.5 


-3.7 


1.8 


0 


0 


-3.1 






0 


0 


0 


-2.6 


-3.2 


0 


-1 


4 






-1.5 


-1.8 


0 


3.4 


1.4 


1.1 


0 


1.7 






-1.8 


0 


0 


-3 


1.1 


2.4 


0 


0 




V 


-1.3 


0 


-1 


0 


2.1 


0 


0 


2.2 


/ 



(75) 



and /Jit = 2, i = 1, • • • , 8. For the filter, each coefficient 
in A is initialized from a Gaussian distribution with mean 
0 and variance 0.2. Moreover, the coefficient \i{ is ini- 
tialized from a Gaussian distribution with mean 1.5 and 
variance 0.2. The system state is initialized using the first 
measurement. 

The metric used to evaluate the inferred GRN is the true 
positive rate (TPR), the false positive rate (FPR), and the 
positive predictive value (PPV). They are given by [34] 

TP# 

TPR = — , (76) 



FPR = 



PPV = 



TP# + FN# 

FP# 
FP# + TN#' 

TP# 
TP# + FP#' 



(77) 
(78) 



where the number of true positives (TP#) denotes the 
number of links correctly predicted by the inference algo- 
rithm; the number of false positives (FP#) denotes the 
number of incorrectly predicted links; the number of true 
negatives (TN#) denotes the number of correctly pre- 
dicted nonlinks; and the number of false negatives (FN#) 
denotes the number of missed links by the inference 
algorithm [8]. 

5.1.1 Comparison of the EKF with poin t-based Gaussian 
approximation filters 

The UKF with different parameter k is tested. The sim- 
ulation results based on 50 Monte Carlo runs are shown 
in Table 1. It can be seen that UKFs with k = 0, 2, 5 have 
slight better performance than UKFs with k = —5,-2. 
One possible reason is that the weights of all sigma points 
used in the UKF are all positive when k > 0. In gen- 
eral, all positive weights will guarantee better stability of 
the filtering algorithm. However, it should be emphasized 
that, in this specific example, there is no big difference 
between UKFs with different /c. In addition, the objective 
of this paper was to investigate the proposed filter incor- 
porating the prior information. Hence, the UKF is used to 
denote UKF with k = 3 — n and compare with the filters 
incorporating the prior information. 

The inference results of the EKF, the UKF, the CKF3, 
and the CKF5 are summarized in Table 2, all results are 
based on 50 Monte Carlo runs. It can be seen that all 
point-based Gaussian approximation filters have better 
performance than the EKF since the average (avg) FPR is 
lower and the average TPR and precision are higher than 
that of the EKF. Although the CKFs exhibit slightly better 



Table 5 Effect of strength of the links using different X 









True positive rate 






False positive rate 




Positive predictive rate 


Filters 




Min 


Max 


Avg 


Min 


Max 


Avg 


Min 


Max 


Avg 


UKFp] {X = 


0.1) 


0.7576 


0.9677 


0.8484 


0.4688 


0.7647 


0.5713 


0.5094 


0.6512 


0.5929 


UKFp 1 (A = 


0.5) 


0.7576 


0.9333 


0.8468 


0.4516 


0.7059 


0.5422 


0.5385 


0.6512 


0.6057 


UKFp 1 (X = 


1) 


0.7500 


0.9032 


0.8221 


0.3750 


0.5758 


0.4953 


0.5814 


0.6842 


0.6257 


UKFp] {X = 


5) 


0.7273 


0.8750 


0.8220 


0.3548 


0.5000 


0.4169 


0.6098 


0.7179 


0.6684 


UKFpT(X = 


10) 


0.7500 


0.8750 


0.8214 


0.3226 


0.5000 


0.4143 


0.6098 


0.7368 


0.6696 
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Table 6 Effect of false prior information using different X 



True positive rate False positive rate Positive predictive rate 



Filters 


Min 


Max 


Avg 


Min 


Max 


Avg 


Min 


Max 


Avg 


UKFp l (A = 0.1) 


0.7576 


0.9667 


0.8491 


0.5000 


0.7647 


0.5933 


0.5094 


0.6279 


0.5835 


UKFp 1 (A = 0.5) 


0.7576 


0.9355 


0.8535 


0.4839 


0.7647 


0.5962 


0.5094 


0.6279 


0.5836 


UKFp 1 (A = 1) 


0.7576 


0.9333 


0.8572 


0.4839 


0.7059 


0.6001 


0.5200 


0.6279 


0.5830 


UKF^ft. = 5) 


0.6970 


0.8125 


0.7546 


0.4194 


0.5938 


0.5000 


0.5682 


0.6486 


0.6062 


UKF^ft. = 10) 


0.5758 


0.7576 


0.6810 


0.3226 


0.5000 


0.4066 


0.5676 


0.7059 


0.6369 



filtering performance than the UKF in some runs, they are 
comparable in terms of TPR, FPR, and PPV. 

Based on the above tests, in the rest of the paper, only 
the UKF is used. 

5. 7 .2 Comparison of the UKF and the UKF incorporating the 
prior information 

As mentioned above, the UKF is used as a typical filter 
to compare the performance with and without the prior 
information. 

Incorporating existing network information The fol- 
lowing prior existing network information is assumed to 
be known: 1) genel, gene5, and gene7 have little possi- 
bility to regulate gene2; 2) gene2, gene3, gene8 have little 




Figure 1 Inferred regulations of UKF p2 and true regulations. 



possibility to regulate gene7. Hence, the indicator matrix 
in (50) is given by 



/0 0 0 0 0 0 0 0\ 
10001010 
00000000 
00000000 
00000000 
00000000 
0 1 1 0 0 0 0 1 

\0 0000000/ 



(79) 



The comparison of the UKF and the UKF incorporat- 
ing the existing network information (denoted by UKF^i) 
with k = 2 is shown in Table 3. It can be seen that the 
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Figure 2 Inferred regulations of UKF and true regulations. 



average TP# and TN# of the UKF^i are both higher than 
those of the UKF. In addition, the average FP# and FN# 
of the UKF^i are lower than those of the UKF. Hence, the 
UKF^i predicts more correct links and nonlinks than the 
UKF. Moreover, the UKF^i produces less incorrect links 
and missed links than the UKF. The average TPR and the 
precision of the UKF^i are higher than those of the UKF. 
In addition, the average FPR of the UKF^i is lower than 
that of the UKF. Hence, by using the existing network 
information, the inference accuracy can be improved. 

The performance of UKF^i with different k is shown 
in Table 4. It is seen that the performance of UKF^i and 
UKF is close when k is small since only small regulation 
is imposed on the solution. When k is large, the differ- 
ence between the UKF^i and UKF is large. In particular, 
the UKF^i provides sparser solution than the UKF when 
k is large. It can be seen from Table 4, the average FPR 



of UKF^i decreases with the increasing of k. The average 
TPR of UKF^i, however, does not increase monotoni- 
cally with the increasing of k. The average PPR of UKF^i 
increases with the increasing of k. Hence, roughly speak- 
ing, the UKF^i is better than the UKF when large k is 
used. 

To consider the strength of the links, rather than setting 
it to 1, eij (in the indicator matrix E) is set to different val- 
ues. Large is used if the strength of the link from gene / 
to gene i is strong. For convenience, the UKF considering 
the strength of links is denoted as UKF^. To compare the 
performance of UKF^ with UKF^i, for UKF^, the values 
of the second row in Equation (79) is multiplied by 5. The 
performance of UKF^ using different k is given in Table 5. 
It can be seen from Tables 4 and 5 that the performance of 
UKF^ and UKF^i is close when k is small, e.g., k = 0.1. 
In addition, the average TPR and FPR of UKF^ is smaller 



Table 7 Comparison of \JKF p2 using different k 









True positive rate 






False positive rate 




Positive predictive rate 


Filters 




Min 


Max 


Avg 


Min 


Max 


Avg 


Min 


Max 


Avg 


UKF p2 (X 


= 0.1) 


0.7576 


0.9355 


0.8304 


0.4839 


0.6970 


0.5699 


0.5306 


0.6512 


0.5914 


UKF p2 (X 


= 0.5) 


0.6970 


0.8710 


0.7750 


0.4194 


0.5758 


0.4902 


0.5682 


0.6585 


0.6198 


UKF p2 (X 


= D 


0.6970 


0.7879 


0.7335 


0.4194 


0.5000 


0.4462 


0.5897 


0.6667 


0.6355 


UKF p2 (X 


= 5) 


0.4545 


0.6667 


0.5501 


0.3226 


0.4516 


0.3791 


0.5714 


0.6471 


0.6064 


UKF p2 (A, 


= 10) 


0.4545 


0.5455 


0.4800 


0.2903 


0.3871 


0.3523 


0.5556 


0.6538 


0.5920 
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HAP1 



CYT1 



COX5A 




Figure 3 Pathway model of the five genes in yeast protein 
synthesis network. 



than that of UKF^i for all tested k except for A. = 0.1. 
Hence, PPR is used to evaluate the performance of UKF^ 
and UKF^i. Although the average PPR of UKF pi and 
UKF^i is close when the k is large, e.g., k = 10, the average 
PPR of UKFgi is consistently higher than that of UKF^i. 
The results indicate that the inference accuracy of UKF^ 
and UKF^i are close when k is very small or very large. 
The inference accuracy of UK¥p 1 outperforms UKF^i 
when the appropriate strength of the link and parameter k 
are used. 



To consider the effect of false prior knowledge, the 
second row of the indicator matrix in Equation (79) is 
changed to [0,1,1,1,0,1,0,1], which conflicts with the 
truth. For convenience, we use UKF^i to denote the UKF 
incorporating this false prior knowledge. In Table 6, the 
performance of UKF^i with different k is shown. It can be 
seen from Tables 4 and 6 that the average TPR of UKF^i 
is smaller than that of UKF^i when k is small, e.g., k = 
0.1,0.5. In addition, the average FPR of UKF^i is larger 
than that of UKF^i when k is large, e.g., k = 5, 10. More- 
over, although the average PPR of UKF^i is close to that of 
UKF^i when k is small, the average PPR of UKF^i is con- 
sistently lower than that of UKF^i. Hence, as expected, the 
results indicate that the false prior knowledge will lead to 
worse inference result. 

Incorporating LASSO The problem setup is the same as 
before except that the LASSO rather than the existing net- 
work information is used. The UKF incorporating LASSO 
is denoted as UKF^. 

As shown in Table 3, the average TP# and FP# of UKF^ 
are lower than those of UKF and the average TN# and FN# 
of UKF^2 are higher than those of UKF. Hence, UKF^2 
produces less links, including correct and incorrect ones. 
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Figure 4 True gene expression and model output. 
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Figure 5 Variance of regulatory coefficients. 
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In addition, UK¥ p 2 produces more nonlinks and missed 
links. It is consistent with the fact that the LASSO tends to 
provide a sparse solution. It can be seen from Table 3 that 
the average FPR of UKF^ is lower than that of UKF and 
the average precision of UKF^ is higher than that of UKF. 
Hence, by incorporating LASSO, the inference accuracy is 
improved. 

A representative inference result of UKF^2 and the true 
regulations are shown in Figure 1. For comparison, the 
inference result of UKF and the true regulations are shown 
in Figure 2. By comparing Figure 2 and Figure 1, it can be 
seen that UKF falsely predicts the nonlinks from genel to 
gene2, from gene3 to gene6, from gene4 to gene8, from 
gene5 to gene2, and from gene6 to gene4 while UKF^2 
does not. 

The performance UKF^2 with different k is shown in 
Table 7. It is seen that the performance of UKF^ and 



UKF is close when k is small since only small regulation is 
imposed on the solution. When k is large, the difference 
between UK¥ p 2 and UKF is large. The average TPR and 
FPR of UKF^2 decrease with the increasing of the k. The 
average PPR does not increase monotonicallly with the 
increasing of k. Generally speaking, for different k, UKF^ 
is more sensitive than that of UKF^i. Although the perfor- 
mance of UKF^2 depends on k, the average PPR of UK¥ P 2 
is consistently higher than that of UKF. Hence, roughly 
speaking, UKF^ has better performance than UKF. 

Incorporating the range constraint The existing net- 
work information can be used to provide the rough range 
constraint of x. A tight constraint is forced on the regula- 
tion coefficient ay when there is a small regulation possi- 
bility from gene; to gene/ and a loose constraint is forced 
on the regulation coefficient with no prior information. 



Table 8 Inferred results of the UKF and UKF p2 

Filters True positives # False positives # True negatives # False negatives # 

UKF 1 7 14 3 



UKF p2 2 3 18 2 



Filters 


TPR 


FPR 


Precision 


UKF 


0.25 


0.3333 


0.1250 


UKF p2 


0.5000 


0.1429 


0.4000 
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In the simulation, for the coefficients corresponding to 
the zero elements in (79), the lower bound and the upper 
bound are set as —10 and 10, respectively. For the coef- 
ficients corresponding to the nonzero elements in (79), 
the lower bound and the upper bound are set as —0.1 
and 0.1, respectively. The UKF incorporating the range 
constraint is denoted as UKF^3. As shown in Table 3, 
the average FPR of UKF^3 is lower than that of UKF 
and the average precision of UKF^3 is higher than that 
of UKF. 

5.2 Yeast protein synthesis network 

In this section, time-series gene expression data of the 
yeast protein synthesis network is used. Five genes (HAP1, 
CYB2, CYC7,CYT1, and COX5A) of the yeast protein syn- 
thesis network are considered and 17 data points which 
can be found in [35] are collected. The regulation rela- 
tionship between them has been revealed by the biological 
experiment and shown in Figure 3. The dashed arrow in 
Figure 3 denotes repression and the solid arrow denotes 
activation.' 

The GRN is inferred by the UKF and UKF^2- The pre- 
dicted gene expressions using parameters estimated by 
UKF^2 and the true measured gene expressions are shown 
in Figure 4. It can be seen that the model output fits the 
measured data well. The variances of the regulatory coef- 
ficients of HAP1 (Pi/(1 < i < 5)) are shown in Figure 5. 
It can be seen that the filter converges since the vari- 
ance Pn approaches zero. The results for other regulatory 
coefficients are similar and not shown here. The evalua- 
tion of the inferred GRN by UKF and UKF^2 is shown in 
Table 8. 

By incorporating the sparsity constraint, UKF^ pro- 
vides much better inference results than UKF. As shown in 
Table 8, the TP# and TN# of UKF^ 2 are higher than those 
of UKF and the FP# and FN# are lower than those of UKF. 
In addition, it can be seen from Table 8, the FPR of UKF^2 
is lower than that of UKF and the TPR and the precision 
of UKF^ 2 is higher than that of UKF. 

6 Conclusions 

In this paper, we have proposed a framework of employ- 
ing the point-based Gaussian approximation filters which 
incorporates the prior knowledge to infer the gene regu- 
latory network (GRN) based on the gene expression data. 
The performance of the proposed framework is tested 
by a synthetic network and the yeast protein synthesis 
network. Numerical results show that the inference accu- 
racy of the GRN by the proposed point-based Gaussian 
approximation filter incorporating the prior information 
is higher than using the traditional filters without incor- 
porating prior knowledge. The proposed method works 
for small- and medium-size GRNs due to the compu- 
tational complexity considerations. It remains a future 



research topic how to adapt the proposed inference frame- 
work to handle large GRNs at reasonable computational 
complexity. 
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